1.  Introduction

In a Matrix-Differential Game (MDG), a finite payoff matrix $A$ is
given, and player I (II) tries to maximize (minimize) the payoff by picking
the proper row (column). However, the choices of the two players are not
made simultaneously, so the conventional result that the game value is
obtained when the two players use optimal mixed strategies does not hold.
Instead, a number $\sigma$, with $0 \le \sigma \le 1$, is given, with the interpretation
that $\sigma$ is the relative reaction time of player I. The elements of $A$
are interpreted as payoff rates, with the payoff accruing at the rate $a(i,j)$
for a time $\sigma$ if player II has just picked column $j$ (after which player
I picks a new row), or for a time $1-\sigma$ if player I has just picked row $i$
(after which player II picks a new column). Play is assumed to go on
indefinitely, and the payoff $\Omega(\sigma)$ is defined to be the long-run average
of the payoff rate. The words "long run" are of course relative to the
length of the time unit.

The idea of an MDG was invented by Danskin [1], who observed that the
ability to solve MDG's is a prerequisite for being able to solve for the
$\Omega$-value of a differential game, where the payoff matrix of the MDG would
be a Hamiltonian function involving the equations of motion. In fact, it
is non-separable differential games that provide most of the interest in
MDG's, since it can be shown that optimal strategies for playing MDG's
do not involve choosing random numbers. The spectre of the two players
trying to choose independent random numbers continuously in a differential
game has given rise to the $\Omega$-formulation (along with several others [2],
[3], and [4]), where the need for mixed strategies is eliminated by having
the players "take turns" in some sense or other.


Formally, an $N \times M$ MDG is a stochastic game with perfect information,
$2MN$ positions (the position depends on whose move it is, hence the factor
of 2), and zero stop probabilities. The existence of the $\Omega$-value
and of stationary optimal strategies has therefore been proved by Gillette
in his paper on the subject [5]. (His proof contained an error that was
subsequently corrected by Liggett and Lippman [6].) Our Theorem 1 goes on
to state that the $\Omega$-value and the optimal stationary strategies must
satisfy certain "equilibrium equations" (2.6 and 2.7) that form the basis
of the computational procedure discussed in Section 3. Theorem 2 shows
that our $\Omega$-value is the same as Danskin's, and Theorem 3 establishes some
elementary properties of the $\Omega$-value of an MDG. These three theorems,
together with the computational procedure of Section 3, are the content
of this technical report.


2.  The Equilibrium Equations

Throughout this report the symbol $A = [a(i,j)]$ will denote an
$N \times M$ matrix with real entries $a(i,j)$; $i = 1,\dots,N$; $j = 1,\dots,M$.
We further reserve the letters $I$ and $J$ to denote the sets of row and
column indices, respectively, i.e.

$$I = \{1,\dots,N\}, \qquad J = \{1,\dots,M\}.$$

The letter $\sigma$ will always denote a real number from the interval $[0,1]$.

Next let $\mathcal{P}$ be the set of all mappings from $J$ to $I$, and let
$\mathcal{Q}$ be the set of all mappings from $I$ to $J$. A sequence

$$P = \{p_\nu : p_\nu \in \mathcal{P},\ \nu = 1,2,\dots\}$$

or a sequence

$$Q = \{q_\nu : q_\nu \in \mathcal{Q},\ \nu = 1,2,\dots\}$$

will be referred to as a response strategy of player I or II, respectively.
The sets of all response strategies will be denoted $\mathcal{P}^\infty$ and $\mathcal{Q}^\infty$. A response
strategy $P$ or $Q$ such that $p_1 = p_2 = \dots = p$ or such that $q_1 = q_2 = \dots = q$
will be called stationary.

Let $P = \{p_\nu\} \in \mathcal{P}^\infty$ and $Q = \{q_\nu\} \in \mathcal{Q}^\infty$ be a pair of response strategies, and
let $n$ be a positive integer. With any such pair $(P,Q)$ we now associate
a quantity $H_n(P,Q \mid i_0)$, called the n-stage payoff given the row-predecessor
$i_0 \in I$, and defined by

$$H_n(P,Q \mid i_0) = \frac{1}{n} \sum_{\nu=1}^{n} \left[ \sigma a(i_{\nu-1}, j_\nu) + (1-\sigma) a(i_\nu, j_\nu) \right],$$

where $j_\nu = q_\nu(i_{\nu-1})$, $i_\nu = p_\nu(j_\nu)$, $\nu = 1,2,\dots$

Similarly, the n-stage payoff given the column-predecessor $j_0 \in J$ is defined by

$$H_n(P,Q \mid j_0) = \frac{1}{n} \sum_{\nu=1}^{n} \left[ (1-\sigma) a(i_\nu, j_{\nu-1}) + \sigma a(i_\nu, j_\nu) \right],$$

where this time $i_\nu = p_\nu(j_{\nu-1})$, $j_\nu = q_\nu(i_\nu)$, $\nu = 1,2,\dots$
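The two n-stage payoffs can be evaluated directly from their definitions. The following is a minimal sketch, assuming stationary response strategies stored as zero-based Python lists (`p[j]` is the row answering column `j`, `q[i]` is the column answering row `i`). The function names and the 2 x 2 matrix are illustrative assumptions and do not appear in the report.

```python
def n_stage_payoff_row(A, sigma, p, q, i0, n):
    """H_n(P, Q | i0) for stationary strategies p (column -> row), q (row -> column)."""
    total = 0.0
    i_prev = i0
    for _ in range(n):
        j = q[i_prev]   # player II answers the previous row
        i = p[j]        # player I answers that column
        total += sigma * A[i_prev][j] + (1.0 - sigma) * A[i][j]
        i_prev = i
    return total / n


def n_stage_payoff_col(A, sigma, p, q, j0, n):
    """H_n(P, Q | j0) for the same pair, starting from a column-predecessor."""
    total = 0.0
    j_prev = j0
    for _ in range(n):
        i = p[j_prev]   # player I answers the previous column
        j = q[i]        # player II answers that row
        total += (1.0 - sigma) * A[i][j_prev] + sigma * A[i][j]
        j_prev = j
    return total / n


# Hypothetical 2 x 2 payoff matrix used only for illustration.
A = [[3.0, 1.0], [0.0, 2.0]]
p = [0, 1]   # column 0 -> row 0, column 1 -> row 1
q = [1, 0]   # row 0 -> column 1, row 1 -> column 0
print(n_stage_payoff_row(A, 0.5, p, q, 0, 4))
print(n_stage_payoff_col(A, 0.5, p, q, 0, 4))
```

With this pair the play alternates between the two rows, so both averages settle on the same number after one full cycle.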


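The triplet $G(\sigma) = (\mathcal{P}^\infty, \mathcal{Q}^\infty, A)$ with the sequences of payoffs $H_n$ defined
as above will be referred to as a matrix-differential game (MDG).
We are now ready to state the basic definition.

Definition: If there exist a pair of response strategies $P^* \in \mathcal{P}^\infty$, $Q^* \in \mathcal{Q}^\infty$
and a real number $\Omega(\sigma)$ such that for any $i_0 \in I$ or $j_0 \in J$

$$\lim_{n\to\infty} H_n(P^*,Q^* \mid i_0) = \lim_{n\to\infty} H_n(P^*,Q^* \mid j_0) = \Omega(\sigma), \tag{2.1}$$

and for any $P \in \mathcal{P}^\infty$, $Q \in \mathcal{Q}^\infty$

$$\limsup_{n\to\infty} H_n(P,Q^* \mid i_0) \le \Omega(\sigma), \tag{2.2}$$

$$\limsup_{n\to\infty} H_n(P,Q^* \mid j_0) \le \Omega(\sigma), \tag{2.3}$$

$$\liminf_{n\to\infty} H_n(P^*,Q \mid i_0) \ge \Omega(\sigma), \tag{2.4}$$

$$\liminf_{n\to\infty} H_n(P^*,Q \mid j_0) \ge \Omega(\sigma), \tag{2.5}$$

then $\Omega(\sigma)$ is called the omega-value and $P^*$ and $Q^*$ optimal response
strategies of the MDG $G(\sigma)$.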

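Lemma: Let $\omega, x_1,\dots,x_N, y_1,\dots,y_M$ be a solution of the system of
$N + M$ equations:

$$x_i + \omega\sigma = \min_{j \in J} \left[ \sigma a(i,j) + y_j \right], \quad i \in I, \tag{2.6a}$$

$$y_j + \omega(1-\sigma) = \max_{i \in I} \left[ (1-\sigma) a(i,j) + x_i \right], \quad j \in J, \tag{2.6b}$$

and let $p^* \in \mathcal{P}$ and $q^* \in \mathcal{Q}$ be such that

$$x_i + \omega\sigma = \sigma a(i,q^*(i)) + y_{q^*(i)}, \quad i \in I, \tag{2.7a}$$

$$y_j + \omega(1-\sigma) = (1-\sigma) a(p^*(j),j) + x_{p^*(j)}, \quad j \in J. \tag{2.7b}$$

Then (2.1) through (2.5) hold with $\Omega(\sigma) = \omega$ and stationary

$$P^* = \{p^*,p^*,\dots\}, \qquad Q^* = \{q^*,q^*,\dots\}.$$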
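The hypotheses of the lemma are easy to test numerically for a candidate solution. The sketch below, under the assumption of zero-based indices and a hypothetical helper name `check_equilibrium`, evaluates the residuals of (2.6a) and (2.6b) and reads off a pair $p^*$, $q^*$ satisfying (2.7) as the maximizing rows and minimizing columns. The 2 x 2 matrix and the candidate values were worked out by hand for illustration and are not taken from the report.

```python
def check_equilibrium(A, sigma, omega, x, y, tol=1e-9):
    """Check (2.6a)-(2.6b); return (ok, p_star, q_star) with zero-based indices."""
    N, M = len(A), len(A[0])
    p_star, q_star, worst = [], [], 0.0
    for i in range(N):
        vals = [sigma * A[i][j] + y[j] for j in range(M)]
        j_best = min(range(M), key=vals.__getitem__)      # minimizing column in (2.6a)
        worst = max(worst, abs(x[i] + omega * sigma - vals[j_best]))
        q_star.append(j_best)
    for j in range(M):
        vals = [(1.0 - sigma) * A[i][j] + x[i] for i in range(N)]
        i_best = max(range(N), key=vals.__getitem__)      # maximizing row in (2.6b)
        worst = max(worst, abs(y[j] + omega * (1.0 - sigma) - vals[i_best]))
        p_star.append(i_best)
    return worst <= tol, p_star, q_star


# Hypothetical example: candidate solution at sigma = 0.5.
A = [[3.0, 1.0], [0.0, 2.0]]
ok, p_star, q_star = check_equilibrium(A, 0.5, 1.5, [0.0, 0.0], [0.75, 0.25])
print(ok, p_star, q_star)
```

If the check succeeds, the lemma says that the supplied `omega` is the omega-value at that $\sigma$ and that the returned pair is optimal.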


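Proof: We begin with the inequality (2.2). By (2.6b) we have for
every $k \in I$, $j \in J$

$$y_j + \omega(1-\sigma) \ge (1-\sigma) a(k,j) + x_k,$$

so that in particular for $j = q^*(i)$

$$y_{q^*(i)} + \omega(1-\sigma) \ge (1-\sigma) a(k,q^*(i)) + x_k.$$

Substituting for $y_{q^*(i)}$ into (2.7a) this becomes

$$x_i + \omega\sigma \ge \sigma a(i,q^*(i)) + (1-\sigma) a(k,q^*(i)) + x_k - \omega(1-\sigma)$$

or

$$\omega \ge \sigma a(i,q^*(i)) + (1-\sigma) a(k,q^*(i)) + x_k - x_i, \tag{2.8}$$

for every $i \in I$, $k \in I$. Let $P = \{p_\nu\} \in \mathcal{P}^\infty$ and
$i_0 \in I$ be arbitrary, and let

$$j_\nu = q^*(i_{\nu-1}), \quad i_\nu = p_\nu(j_\nu), \quad \nu = 1,2,\dots$$

Substituting $i = i_{\nu-1}$, $k = i_\nu$ into (2.8) and averaging over
$\nu = 1,\dots,n$ we obtain

$$\omega \ge H_n(P,Q^* \mid i_0) + \frac{1}{n} \sum_{\nu=1}^{n} \left( x_{i_\nu} - x_{i_{\nu-1}} \right).$$

The sum telescopes, so that

$$\left| \frac{1}{n} \sum_{\nu=1}^{n} \left( x_{i_\nu} - x_{i_{\nu-1}} \right) \right| = \frac{1}{n} \left| x_{i_n} - x_{i_0} \right| \le \frac{2}{n} \max_{i \in I} |x_i|,$$

which tends to zero as $n \to \infty$.

Hence (2.2) holds with $\Omega(\sigma) = \omega$ and $Q^* = \{q^*,q^*,\dots\}$.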

To prove (2.3) let this time $j_0 \in J$ be arbitrary, let

$$i_\nu = p_\nu(j_{\nu-1}), \quad j_\nu = q^*(i_\nu), \quad \nu = 1,2,\dots$$

and substitute $k = i_{\nu+1}$, $i = i_\nu$ into (2.8).
Averaging over $\nu = 1,\dots,n$ we obtain

$$\omega \ge H_n(P,Q^* \mid j_0) - \frac{1-\sigma}{n} \left[ a(i_1,j_0) - a(i_{n+1},j_n) \right] + \frac{1}{n} \sum_{\nu=1}^{n} \left( x_{i_{\nu+1}} - x_{i_\nu} \right),$$

the last two terms again tending to zero as $n \to \infty$.

Inequalities (2.4) and (2.5) follow from (2.6a) and (2.7b)
in an analogous manner.

Finally, to establish (2.1) we have from (2.7a,b)

$$\omega = \sigma a(i,q^*(i)) + (1-\sigma) a(p^*(q^*(i)),q^*(i)) + x_{p^*(q^*(i))} - x_i, \quad i \in I,$$

from which by setting $i = i_{\nu-1}$, $i_\nu = p^*(j_\nu)$, $j_\nu = q^*(i_{\nu-1})$, $\nu = 1,2,\dots,n$, and averaging
we have

$$\omega = H_n(P^*,Q^* \mid i_0) + \frac{1}{n} \left( x_{i_n} - x_{i_0} \right),$$

and (2.1) follows. The lemma is proved.

Remark 1: Investigation of the preceding proof reveals that a slightly
stronger statement can be made. In fact we proved that there exists a constant
$C < \infty$ such that for any $P \in \mathcal{P}^\infty$, $Q \in \mathcal{Q}^\infty$

$$H_n(P,Q^* \mid i_0) \le \Omega(\sigma) + \frac{C}{n}, \tag{2.2'}$$

$$H_n(P^*,Q \mid i_0) \ge \Omega(\sigma) - \frac{C}{n}, \tag{2.4'}$$

for every $n = 1,2,\dots$, and similarly for a column-predecessor $j_0$.
This also implies that the convergence in (2.1) is of order $O(1/n)$.
With the aid of the lemma we now prove the main theorem about
MD games.

Theorem 1: Every MD game $G(\sigma)$, $\sigma \in [0,1]$, has an omega-value and each
player has a stationary optimal response strategy. Further, for every
$\sigma \in [0,1]$ the omega-value $\Omega(\sigma)$ is the unique number $\omega$ satisfying the
system (2.6), and $P^* = \{p^*\}$, $Q^* = \{q^*\}$, where $p^*$ and $q^*$ satisfy
(2.7a) and (2.7b) respectively, are stationary optimal response strategies.


Proof: In view of the previous lemma and the fact that, according to its
definition, the omega-value must be unique, the theorem will follow as
long as we prove that the system (2.6) always has a solution.

To this end notice first that if $x_1,\dots,x_N$ is a solution of
the system of $N$ equations:

$$x_k = \min_{j \in J} \max_{i \in I} \left[ \sigma a(k,j) + (1-\sigma) a(i,j) + x_i \right] - \frac{1}{N} \sum_{\ell=1}^{N} \min_{j \in J} \max_{i \in I} \left[ \sigma a(\ell,j) + (1-\sigma) a(i,j) + x_i \right], \quad k \in I, \tag{2.9}$$

then $\omega, x_1,\dots,x_N, y_1,\dots,y_M$, where

$$\omega = \frac{1}{N} \sum_{\ell=1}^{N} \min_{j \in J} \max_{i \in I} \left[ \sigma a(\ell,j) + (1-\sigma) a(i,j) + x_i \right],$$

$$y_j = \max_{i \in I} \left[ (1-\sigma) a(i,j) + x_i \right] - \omega(1-\sigma), \quad j \in J,$$

is a solution of the system (2.6).
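The reduction from (2.6) to (2.9) can be made concrete. The sketch below evaluates the right-hand side of (2.9) as a map on the vector $x$ and recovers $\omega$ and $y$ from $x$ by the two formulas above. The function names and the 2 x 2 example are assumptions for illustration. The sketch only evaluates the map; it does not search for a fixed point, since the existence argument that follows is not constructive.

```python
def _minmax_terms(A, sigma, x):
    """Return ([min_j max_i(...)] for each row k, [max_i((1-sigma) a(i,j) + x_i)] for each j)."""
    N, M = len(A), len(A[0])
    col_max = [max((1.0 - sigma) * A[i][j] + x[i] for i in range(N)) for j in range(M)]
    row_min = [min(sigma * A[k][j] + col_max[j] for j in range(M)) for k in range(N)]
    return row_min, col_max


def f_map(A, sigma, x):
    """Right-hand side of (2.9): each min-max term minus the average of all of them."""
    row_min, _ = _minmax_terms(A, sigma, x)
    mean = sum(row_min) / len(row_min)
    return [m - mean for m in row_min]


def recover_omega_y(A, sigma, x):
    """omega and y built from a solution x of (2.9), as in the text."""
    row_min, col_max = _minmax_terms(A, sigma, x)
    omega = sum(row_min) / len(row_min)
    y = [c - omega * (1.0 - sigma) for c in col_max]
    return omega, y


# Hypothetical example: x = (0, 0) is a fixed point of the map at sigma = 0.5.
A = [[3.0, 1.0], [0.0, 2.0]]
x = [0.0, 0.0]
print(f_map(A, 0.5, x))
print(recover_omega_y(A, 0.5, x))
```

By construction the components returned by `f_map` always sum to zero, which is the property used below to keep the map inside the hypercube $C$.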

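Call temporarily $f_k(x_1,\dots,x_N)$, $k \in I$, the right-hand side of the $k$-th
equation (2.9). Since

$$|f_k(x_1,\dots,x_N)| \le \frac{1}{N} \sum_{\ell=1}^{N} \left| \min_{j \in J} \max_{i \in I} \left[ \sigma a(k,j) + (1-\sigma) a(i,j) + x_i \right] - \min_{j \in J} \max_{i \in I} \left[ \sigma a(\ell,j) + (1-\sigma) a(i,j) + x_i \right] \right|$$

$$\le \frac{1}{N} \sum_{\ell=1}^{N} \max_{j \in J} \left| \max_{i \in I} \left[ \sigma a(k,j) + (1-\sigma) a(i,j) + x_i \right] - \max_{i \in I} \left[ \sigma a(\ell,j) + (1-\sigma) a(i,j) + x_i \right] \right|$$

$$\le \frac{1}{N} \sum_{\ell=1}^{N} \max_{j \in J} \max_{i \in I} \left| \left[ \sigma a(k,j) + (1-\sigma) a(i,j) \right] - \left[ \sigma a(\ell,j) + (1-\sigma) a(i,j) \right] \right|$$

$$\le \frac{\sigma}{N} \sum_{\ell=1}^{N} \max_{j \in J} |a(k,j) - a(\ell,j)| \le 2\sigma \|A\|,$$

where

$$\|A\| = \max_{i \in I} \max_{j \in J} |a(i,j)|,$$

and since clearly $\sum_{k=1}^{N} f_k(x_1,\dots,x_N) = 0$,
the vector-valued function $f = (f_1,\dots,f_N)$ maps the $(N-1)$-dimensional
hypercube

$$C = \left\{ (x_1,\dots,x_N) : \sum_{i=1}^{N} x_i = 0,\ \max_{i \in I} |x_i| \le 2\sigma \|A\| \right\}$$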

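into itself. Next, for any $k \in I$,

$$|f_k(x_1,\dots,x_N) - f_k(x'_1,\dots,x'_N)| \le \left| \min_{j \in J} \max_{i \in I} \left[ \sigma a(k,j) + (1-\sigma) a(i,j) + x_i \right] - \min_{j \in J} \max_{i \in I} \left[ \sigma a(k,j) + (1-\sigma) a(i,j) + x'_i \right] \right|$$

$$+ \frac{1}{N} \sum_{\ell=1}^{N} \left| \min_{j \in J} \max_{i \in I} \left[ \sigma a(\ell,j) + (1-\sigma) a(i,j) + x'_i \right] - \min_{j \in J} \max_{i \in I} \left[ \sigma a(\ell,j) + (1-\sigma) a(i,j) + x_i \right] \right|$$

$$\le \max_{i \in I} |x_i - x'_i| + \frac{1}{N} \sum_{\ell=1}^{N} \max_{i \in I} |x_i - x'_i| = 2 \max_{i \in I} |x_i - x'_i|.$$

Hence the function $f$ is also continuous and therefore, by the Brouwer Fixed
Point Theorem, there exists $(x_1,\dots,x_N) \in C$ such that

$$x_k = f_k(x_1,\dots,x_N), \quad k \in I.$$

Thus the system (2.9), and consequently the system (2.6), always has a
solution and the theorem is proved.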

John Danskin originally defined the omega-value of an MD game as
a limit of ordinary pure values of a sequence of games with perfect
information ([2], see also [3]). The next theorem shows that Danskin's
definition and the one used in this paper are equivalent.


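Theorem 2: If $\Omega(\sigma)$ is the omega-value of an MD game $G(\sigma)$ then for
every $i_0 \in I$

$$\Omega(\sigma) = \lim_{n\to\infty} \min_{j_1 \in J} \max_{i_1 \in I} \cdots \min_{j_n \in J} \max_{i_n \in I} \frac{1}{n} \sum_{\nu=1}^{n} \left[ \sigma a(i_{\nu-1},j_\nu) + (1-\sigma) a(i_\nu,j_\nu) \right], \tag{2.10}$$

and for every $j_0 \in J$

$$\Omega(\sigma) = \lim_{n\to\infty} \max_{i_1 \in I} \min_{j_1 \in J} \cdots \max_{i_n \in I} \min_{j_n \in J} \frac{1}{n} \sum_{\nu=1}^{n} \left[ (1-\sigma) a(i_\nu,j_{\nu-1}) + \sigma a(i_\nu,j_\nu) \right]. \tag{2.11}$$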

Proof: Denote temporarily

$$W_n(i_0,j_1,i_1,\dots,j_n,i_n) = \frac{1}{n} \sum_{\nu=1}^{n} \left[ \sigma a(i_{\nu-1},j_\nu) + (1-\sigma) a(i_\nu,j_\nu) \right].$$

Let $p^* \in \mathcal{P}$ and $q^* \in \mathcal{Q}$ be as in (2.7). By (2.2') and (2.4') of
Remark 1 we have for every $n = 1,2,\dots$

$$W_n(i_0,q^*(i_0),i_1,q^*(i_1),\dots,q^*(i_{n-1}),i_n) \le \Omega(\sigma) + \frac{C}{n} \tag{2.12}$$

for any sequence of row indices $i_0, i_1, \dots$, and

$$W_n(i_0,j_1,p^*(j_1),\dots,j_n,p^*(j_n)) \ge \Omega(\sigma) - \frac{C}{n} \tag{2.13}$$

for any sequence of column indices $j_1, j_2, \dots$ and any $i_0 \in I$.

From (2.12) we obtain successively

$$\max_{i_n} W_n(i_0,q^*(i_0),\dots,q^*(i_{n-1}),i_n) \le \Omega(\sigma) + \frac{C}{n},$$

$$\min_{j_n} \max_{i_n} W_n(i_0,q^*(i_0),\dots,i_{n-1},j_n,i_n) \le \Omega(\sigma) + \frac{C}{n},$$

$$\min_{j_1} \max_{i_1} \cdots \min_{j_n} \max_{i_n} W_n(i_0,j_1,\dots,j_n,i_n) \le \Omega(\sigma) + \frac{C}{n}.$$

Similarly from (2.13) we obtain eventually

$$\min_{j_1} \max_{i_1} \cdots \min_{j_n} \max_{i_n} W_n(i_0,j_1,\dots,j_n,i_n) \ge \Omega(\sigma) - \frac{C}{n},$$

and (2.10) follows. (2.11) is proved in the same fashion.


To conclude this section let us investigate some simple properties
of the omega-value. From the definition or from the previous theorem it
is immediately obvious that

$$\Omega(0) = \min_{j \in J} \max_{i \in I} a(i,j),$$

$$\Omega(1) = \max_{i \in I} \min_{j \in J} a(i,j).$$

Also, it is easy to see that if the matrix $A$ is reduced by successively
eliminating dominated rows and columns the omega-value $\Omega(\sigma)$ is not
affected. Another obvious property is that if the entries $a(i,j)$ are
all multiplied by a positive constant $\alpha$ and/or an arbitrary constant
$\beta$ is added to all of them, the omega-value changes accordingly while the
optimal response strategies remain unchanged. Some less obvious properties
of the omega-value are given in the following theorem.
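For very small matrices these properties can be checked by exhaustive search. Because both players have stationary optimal response strategies (Theorem 1), the omega-value equals the minimum over stationary $q$ of the maximum over stationary $p$ of the long-run average payoff. The sketch below enumerates all such pairs. It is an illustration only, with hypothetical function names and a hypothetical 2 x 2 matrix, and its cost grows as $N^M M^N$, so it is not the method of this report.

```python
from itertools import product


def long_run_average(A, sigma, p, q, i0):
    """Average payoff per stage on the cycle reached from row i0 under stationary (p, q)."""
    seen, rows, i = {}, [], i0
    while i not in seen:            # follow rows until one repeats
        seen[i] = len(rows)
        rows.append(i)
        i = p[q[i]]
    cycle = rows[seen[i]:]
    total = sum(sigma * A[r][q[r]] + (1.0 - sigma) * A[p[q[r]]][q[r]] for r in cycle)
    return total / len(cycle)


def omega_brute_force(A, sigma, i0=0):
    """min over q of max over p of the long-run average, by full enumeration."""
    N, M = len(A), len(A[0])
    best = float("inf")
    for q in product(range(M), repeat=N):
        reply = max(long_run_average(A, sigma, p, q, i0)
                    for p in product(range(N), repeat=M))
        best = min(best, reply)
    return best


A = [[3.0, 1.0], [0.0, 2.0]]   # hypothetical example
for s in (0.0, 0.5, 1.0):
    print(s, omega_brute_force(A, s))
```

For this matrix the end points agree with the min-max and max-min values of $A$, as the formulas for $\Omega(0)$ and $\Omega(1)$ require.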

Theorem 3: The omega-value $\Omega(\sigma)$ of an MD game is a continuous, non-increasing
and finite piecewise linear function of $\sigma \in [0,1]$.


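Proof: Let $\sigma_1 \in [0,1]$, $\sigma_2 \in [0,1]$. By Theorem 2

$$|\Omega(\sigma_1) - \Omega(\sigma_2)| = \left| \lim_{n\to\infty} \left[ \min_{j_1 \in J} \max_{i_1 \in I} \cdots \min_{j_n \in J} \max_{i_n \in I} \frac{1}{n} \sum_{\nu=1}^{n} \left( \sigma_1 a(i_{\nu-1},j_\nu) + (1-\sigma_1) a(i_\nu,j_\nu) \right) - \min_{j_1 \in J} \max_{i_1 \in I} \cdots \min_{j_n \in J} \max_{i_n \in I} \frac{1}{n} \sum_{\nu=1}^{n} \left( \sigma_2 a(i_{\nu-1},j_\nu) + (1-\sigma_2) a(i_\nu,j_\nu) \right) \right] \right|$$

$$\le \lim_{n\to\infty} \left| \min_{j_1 \in J} \max_{i_1 \in I} \cdots \min_{j_n \in J} \max_{i_n \in I} \frac{1}{n} \sum_{\nu=1}^{n} \left( \sigma_1 a(i_{\nu-1},j_\nu) + (1-\sigma_1) a(i_\nu,j_\nu) \right) - \min_{j_1 \in J} \max_{i_1 \in I} \cdots \min_{j_n \in J} \max_{i_n \in I} \frac{1}{n} \sum_{\nu=1}^{n} \left( \sigma_2 a(i_{\nu-1},j_\nu) + (1-\sigma_2) a(i_\nu,j_\nu) \right) \right|$$

$$\le \lim_{n\to\infty} \max_{j_1 \in J} \max_{i_1 \in I} \cdots \max_{j_n \in J} \max_{i_n \in I} \frac{1}{n} \sum_{\nu=1}^{n} \left| (\sigma_1 - \sigma_2) a(i_{\nu-1},j_\nu) + (\sigma_2 - \sigma_1) a(i_\nu,j_\nu) \right|$$

$$\le 2 \|A\| \, |\sigma_1 - \sigma_2|,$$

where again $\|A\| = \max_{i \in I} \max_{j \in J} |a(i,j)|$.

Hence $\Omega(\sigma)$ is continuous in $\sigma \in [0,1]$.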
Next let

$$(i_1, j_1, \dots, i_m, j_m)$$

be a finite sequence of distinct row and column indices such that

$$j_\nu = q^*(i_\nu), \quad i_{\nu+1} = p^*(j_\nu), \quad \nu = 1,\dots,m, \quad i_{m+1} = i_1,$$

where $p^* \in \mathcal{P}$ and $q^* \in \mathcal{Q}$ satisfy (2.7). Clearly, such a sequence exists
for every $\sigma \in [0,1]$ and may be called an optimal cycle. By (2.7) we have

$$x_{i_\nu} + \omega\sigma = \sigma a(i_\nu,j_\nu) + y_{j_\nu}, \tag{2.14a}$$

$$y_{j_\nu} + \omega(1-\sigma) = (1-\sigma) a(i_{\nu+1},j_\nu) + x_{i_{\nu+1}}, \quad \nu = 1,\dots,m. \tag{2.14b}$$

Adding these equations and using the fact that $x_{i_{m+1}} = x_{i_1}$ we obtain

$$m\omega = \sigma \sum_{\nu=1}^{m} a(i_\nu,j_\nu) + (1-\sigma) \sum_{\nu=1}^{m} a(i_{\nu+1},j_\nu). \tag{2.15}$$

Since an optimal cycle exists for every $\sigma \in [0,1]$ and since there is only
a finite number of possible optimal cycles, continuity of $\Omega(\sigma)$ implies
that there must be a finite partition

$$0 = \sigma_0 < \sigma_1 < \dots < \sigma_p = 1$$

of the interval $[0,1]$ such that in each interval $[\sigma_{k-1},\sigma_k]$ the omega-value
is a linear function of $\sigma$, in particular

$$\Omega(\sigma) = \frac{\sigma}{m} \sum_{\nu=1}^{m} a(i_\nu,j_\nu) + \frac{1-\sigma}{m} \sum_{\nu=1}^{m} a(i_{\nu+1},j_\nu), \quad \sigma_{k-1} \le \sigma \le \sigma_k. \tag{2.16}$$
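Formula (2.16) is simple enough to evaluate by hand, but a short sketch makes the roles of the two sums explicit. The function names and the example cycle are assumptions for illustration. Note that the code evaluates the formula for any cycle supplied, whereas the non-positive slope established in the remainder of the proof is a property of optimal cycles only.

```python
def cycle_value(A, rows, cols, sigma):
    """Formula (2.16) for the cycle (rows[0], cols[0], rows[1], cols[1], ...)."""
    m = len(rows)
    s_same = sum(A[rows[v]][cols[v]] for v in range(m))             # sum of a(i_v, j_v)
    s_next = sum(A[rows[(v + 1) % m]][cols[v]] for v in range(m))   # sum of a(i_{v+1}, j_v)
    return sigma * s_same / m + (1.0 - sigma) * s_next / m


def cycle_slope(A, rows, cols):
    """Slope of (2.16) in sigma: (sum a(i_v, j_v) - sum a(i_{v+1}, j_v)) / m."""
    return cycle_value(A, rows, cols, 1.0) - cycle_value(A, rows, cols, 0.0)


# Hypothetical 2 x 2 example with the cycle row 0 -> column 1 -> row 1 -> column 0 -> row 0.
A = [[3.0, 1.0], [0.0, 2.0]]
print(cycle_value(A, [0, 1], [1, 0], 0.5))
print(cycle_slope(A, [0, 1], [1, 0]))
```

For this cycle the value is $2.5 - 2\sigma$, so the slope is negative, in agreement with the monotonicity claimed in Theorem 3 on the interval where the cycle is optimal.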

Finally, since by (2.6b)

$$y_j + \omega(1-\sigma) \ge (1-\sigma) a(k,j) + x_k$$

for every $k \in I$, $j \in J$, we have by setting $j = j_\nu$, $k = i_\nu$

$$y_{j_\nu} + \omega(1-\sigma) \ge (1-\sigma) a(i_\nu,j_\nu) + x_{i_\nu}, \quad \nu = 1,\dots,m,$$

from which by adding these inequalities to the $m$ equations (2.14a)
we obtain

$$m\omega \ge \sigma \sum_{\nu=1}^{m} a(i_\nu,j_\nu) + (1-\sigma) \sum_{\nu=1}^{m} a(i_\nu,j_\nu). \tag{2.17}$$

Comparing (2.17) with (2.15) we see that

$$\sum_{\nu=1}^{m} a(i_\nu,j_\nu) \le \sum_{\nu=1}^{m} a(i_{\nu+1},j_\nu),$$

and hence by (2.16) $\Omega(\sigma_{k-1}) \ge \Omega(\sigma_k)$. Thus the function $\Omega(\sigma)$ is
non-increasing and the theorem is proved.
3.  Ergodic Pairs and a Method for Solving MDG's

Given any starting point and any pair of stationary response
strategies, the (row, column) pair will eventually repeat itself, since there
are only finitely many such pairs, and a minimal cycle of such pairs will
then repeat over and over. The payoff to player I will be the average
payoff over that cycle (2.16). If the cycle turns out not to depend on the
initial row or column, the pair of stationary response strategies will be
termed ergodic, and such a pair will be referred to as an ESP. An optimal
pair of stationary response strategies will be referred to as an OSP for
whatever values of $\sigma$ the pair is optimal. Note that whether or not a pair
of strategies is an ESP has nothing to do with the payoff matrix or $\sigma$.

Since the $\Omega$-value does not depend on the starting point, an OSP
must be an ESP, except when an OSP has multiple cycles each of which has an
average payoff (formula 2.16) of $\Omega(\sigma)$. Such OSP's will turn out to be of
some importance to our method, but let us ignore them for the moment, and
attempt to solve for the $\Omega$-value (we want the entire function $\Omega(\cdot)$, not
$\Omega(\sigma)$ for some particular $\sigma$) using only ESP's.

Our method will be to first solve for $\Omega(0)$, and then attempt to
find $\Omega(\sigma)$ in adjoining intervals until finally we find an interval that
includes $\sigma = 1$. When $\sigma = 0$, $\Omega(0) = \min_{j \in J} \max_{i \in I} a(i,j)$, $p^*(j)$ is a row
with the largest payoff in the $j$-th column, and $q^*(i) = j^*$ is a min max
column. $(p^*,q^*)$ is an ESP, since the only possible cycle with $(p^*,q^*)$ is
$(p^*(j^*),j^*), (p^*(j^*),j^*), \dots$ Furthermore, $(p^*,q^*)$ will be an OSP in some
maximal interval $[0,\sigma^*]$, where possibly $\sigma^* = 0$. In order to find $\sigma^*$,
we must solve equations (2.7) for $x$, $y$, and $\omega$ as functions of $\sigma$, after
which $\sigma^*$ will be the largest value of $\sigma$ for which (2.6) holds.
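The starting pair at $\sigma = 0$ can be read directly off the matrix. A minimal sketch, assuming zero-based indices and a hypothetical function name; ties are broken by taking the first maximizing row and the first min max column, which is a choice made here for illustration and not prescribed by the text.

```python
def starting_pair(A):
    """Return (p, q, omega0): the stationary pair described for sigma = 0 and Omega(0)."""
    N, M = len(A), len(A[0])
    # p(j): a row with the largest payoff in column j
    p = [max(range(N), key=lambda i: A[i][j]) for j in range(M)]
    col_max = [A[p[j]][j] for j in range(M)]
    # j*: a min max column; q sends every row to it
    j_star = min(range(M), key=col_max.__getitem__)
    q = [j_star] * N
    return p, q, col_max[j_star]


A = [[3.0, 1.0], [0.0, 2.0]]   # hypothetical example
print(starting_pair(A))
```

Every row is mapped to the same column $j^*$, so the only cycle is the single pair $(p^*(j^*), j^*)$ and the pair is ergodic, as stated above.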
Before solving (2.7), we first note that an arbitrary function of
$\sigma$ can always be added to both sides of 2.6b and 2.7b. In order to obtain
linear expressions, it is convenient to let this function be $\omega\sigma$. If we
then let $x'_i = x_i + \omega\sigma$, 2.6 and 2.7 become

$$x'_i = \min_{j \in J} \left[ \sigma a(i,j) + y_j \right], \quad i \in I \tag{3.6a}$$

$$y_j + \omega = \max_{i \in I} \left[ (1-\sigma) a(i,j) + x'_i \right], \quad j \in J \tag{3.6b}$$

$$x'_i = \sigma a(i,q^*(i)) + y_{q^*(i)}, \quad i \in I \tag{3.7a}$$

$$y_j + \omega = (1-\sigma) a(p^*(j),j) + x'_{p^*(j)}, \quad j \in J \tag{3.7b}$$

In the following, we will drop the prime on $x'_i$, and will not refer
again to (2.6) and (2.7).

Lemma: Let $(p^*,q^*)$ be any ESP. Then the solution of (3.7) for $x$, $y$, $\omega$
as functions of $\sigma$ is unique for $\omega$, and unique for $x$, $y$ except
that one $x_i$ or $y_j$ can be an arbitrary function of $\sigma$. If the
arbitrary function is linear, so are $x$ and $y$. The quantity $\omega$
is a linear function of $\sigma$ in any case.


Proof: Let the unique cycle be $(i_1,j_1,\dots,i_m,j_m)$. Let $x_{i_1} = \alpha(\sigma)$, an
arbitrary function. (3.7a) then defines $y_{j_1}$, after which (3.7b)
defines $x_{i_2}$, etc. Using this procedure, we can define $x$ ($y$)
numbers for each row (column) in the cycle. Furthermore, (3.7b)
with $j = j_m$ will be satisfied if $\omega$ is obtained from formula
(2.16). This defines all $x$ ($y$) numbers for rows (columns) that
appear in the cycle in a manner consistent with (3.7). Let
$i'_1$ be a row that does not appear in the cycle, and let
$(i'_1,j'_1,i'_2,j'_2,\dots,i'_r,j'_r)$ be such that $j'_\nu = q^*(i'_\nu)$, $i'_{\nu+1} = p^*(j'_\nu)$,
and $j'_r$ is the only column in the cycle (such a $j'_r$ will always
exist, on account of the uniqueness of the cycle). Since $y_{j'_r}$
is defined, we can use (3.7a) to obtain $x_{i'_r}$, then (3.7b) to
obtain $y_{j'_{r-1}}$, etc., until finally we obtain $x_{i'_1}$.
Furthermore, since any other such sequence that contains $i'_1$ will
also contain $j'_1,\dots,j'_r$, there is no danger of obtaining a
conflicting definition of $x_{i'_1}$ in the process of defining some
other $x_i$ or $y_j$. Similar remarks hold for columns that do not
appear in the cycle. Every $x_i$ and $y_j$ has therefore been
uniquely defined in a manner consistent with (3.7). If $\alpha(\sigma)$ is
linear, the linearity of $x$ and $y$ follows from the linearity
of (3.7). Since (2.16) is also linear, the lemma is proved.
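The constructive proof translates almost line by line into a procedure. The sketch below, with the hypothetical name `solve_esp`, fixes the arbitrary function as $\alpha(\sigma) = 0$ on one row of the cycle, takes $\omega$ from (2.16), and then repeatedly applies (3.7b) and (3.7a) until every $x_i$ and $y_j$ is defined. It assumes zero-based indices and raises an error when some row or column never becomes defined, which happens when the pair is not ergodic.

```python
def solve_esp(A, sigma, p, q, start_row=0):
    """Solve (3.7) for (omega, x, y) with x = 0 on one cycle row; p, q must form an ESP."""
    N, M = len(A), len(A[0])
    seen, path, i = {}, [], start_row
    while i not in seen:                 # locate the cycle reached from start_row
        seen[i] = len(path)
        path.append(i)
        i = p[q[i]]
    rows = path[seen[i]:]
    omega = sum(sigma * A[r][q[r]] + (1.0 - sigma) * A[p[q[r]]][q[r]]
                for r in rows) / len(rows)          # formula (2.16)
    x, y = [None] * N, [None] * M
    x[rows[0]] = 0.0                     # the arbitrary function, chosen as zero
    changed = True
    while changed:
        changed = False
        for j in range(M):               # (3.7b): y_j from x at p(j)
            if y[j] is None and x[p[j]] is not None:
                y[j] = (1.0 - sigma) * A[p[j]][j] + x[p[j]] - omega
                changed = True
        for r in range(N):               # (3.7a): x_i from y at q(i)
            if x[r] is None and y[q[r]] is not None:
                x[r] = sigma * A[r][q[r]] + y[q[r]]
                changed = True
    if any(v is None for v in x + y):
        raise ValueError("pair is not ergodic: some rows or columns never reach the cycle")
    return omega, x, y


A = [[3.0, 1.0], [0.0, 2.0]]   # hypothetical example
print(solve_esp(A, 0.5, [0, 1], [1, 0]))
```

Evaluating the function at two values of $\sigma$ is enough to recover the linear expressions for $x$, $y$ and $\omega$, since the normalization used here is itself linear.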


We are now ready to describe the process of interval extension,
supposing that $\Omega(\sigma)$ is already known for $0 \le \sigma \le \sigma^*$. If the OSP $(p^*,q^*)$
at $\sigma = \sigma^*$ is actually an ESP, the functions $x_i$, $y_j$ and $\omega$ obtained from
(3.7) will satisfy (3.6) at $\sigma = \sigma^*$. Therefore, unless there are ties for the
minimum in (3.6a) or the maximum in (3.6b) when $\sigma = \sigma^*$, the same expressions
will satisfy (3.6) for $\sigma^* \le \sigma \le \sigma^{**}$, where $\sigma^{**} > \sigma^*$; that is, $(p^*,q^*)$ is
actually an OSP for $\sigma^* \le \sigma \le \sigma^{**}$. The quantity $\sigma^{**}$ will be the smallest
$\sigma$ for which there is a tie. For definiteness, suppose there is a tie in (3.6a),
so that

$$x_i = \sigma^{**} a(i,j) + y_j = \sigma^{**} a(i,q^*(i)) + y_{q^*(i)},$$

where $j \ne q^*(i)$. We form a new response strategy $q^{**}(\cdot)$ by defining
$q^{**}(i) = j$, and $q^{**}(k) = q^*(k)$ for $k \ne i$. Then $(p^*,q^{**})$ is an OSP
at $\sigma = \sigma^{**}$. If it turns out that $(p^*,q^{**})$ is also an ESP, then we can
repeat the process of solving (3.7) for $x_i(\sigma)$, $y_j(\sigma)$, and $\omega(\sigma)$,
increasing $\sigma$ until another tie is encountered in (3.6), etc.
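One step of interval extension can be sketched as follows. Since the solution of (3.7) is linear in $\sigma$, the slack of every unused alternative in (3.6a) and (3.6b) is linear as well, so two evaluations (here at $\sigma = 0$ and $\sigma = 1$) determine where each slack reaches zero. The function name `smallest_tie`, the `solution` callback returning `(omega, x, y)`, and the hand-derived linear solutions in the example are all assumptions for illustration; this is not the MATDIF code.

```python
def smallest_tie(A, p, q, solution, sigma_lo):
    """Smallest sigma >= sigma_lo (capped at 1) where an unused alternative in (3.6) ties."""
    N, M = len(A), len(A[0])

    def slacks(sigma):
        omega, x, y = solution(sigma)
        out = {}
        for i in range(N):
            for j in range(M):
                if j != q[i]:   # alternative to player II's reply in row i, equation (3.6a)
                    out[("3.6a", i, j)] = sigma * A[i][j] + y[j] - x[i]
                if i != p[j]:   # alternative to player I's reply in column j, equation (3.6b)
                    out[("3.6b", i, j)] = y[j] + omega - (1.0 - sigma) * A[i][j] - x[i]
        return out

    s0, s1 = slacks(0.0), slacks(1.0)
    best, label = 1.0, None
    for key, c0 in s0.items():
        c1 = s1[key] - c0               # slack is c0 + c1 * sigma
        if c1 < 0.0:
            root = -c0 / c1
            if sigma_lo <= root < best:
                best, label = root, key
    return best, label


# Hypothetical 2 x 2 example; the linear solutions of (3.7) were derived by hand.
A = [[3.0, 1.0], [0.0, 2.0]]
first = lambda s: (2.0, [-s, 0.0], [1.0 - 4.0 * s, -2.0 * s])              # ESP p=[0,1], q=[1,1]
second = lambda s: (2.5 - 2.0 * s, [0.0, 0.5 - s], [0.5 - s, -s])          # ESP p=[0,1], q=[1,0]
print(smallest_tie(A, [0, 1], [1, 1], first, 0.0))
print(smallest_tie(A, [0, 1], [1, 0], second, 0.25))
```

In this example the first tie occurs in (3.6a) for row 1 and column 0, which changes $q$ from `[1, 1]` to `[1, 0]`; the second tie occurs in (3.6b) and changes $p$ instead. The returned label identifies which response must be redefined before (3.7) is solved again.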

Since we have a starting point (an OSP that is also an ESP at $\sigma = 0$),
there is some hope that the process might map out $\Omega(\sigma)$ for $0 \le \sigma \le 1$ in
intervals, with an OSP/ESP for each interval. Two things are clear:

1)  The process is failsafe in the sense that all answers are correct.

2)  The process may not provide an answer. This could happen either
because the process gets stuck at a certain $\sigma$ (multiple ties
might cause this), or because $(p^*,q^{**})$ is at some stage not
an ESP.

Computational experience with matrices chosen at random has shown that the
process will always (?) provide the answer for (3 x 3) matrices, but that
it will sometimes fail on (9 x 9) matrices and will nearly always fail on
(18 x 18) matrices. When it fails, it fails because it discovers an OSP
that is not an ESP.

This reason for failure is somewhat surprising. If an OSP that is
not an ESP is to hold over an interval, then (2.16) must be the same
function of $\sigma$ for two or more disjoint cycles. If the $a(i,j)$ are chosen
at random, the probability of this is 0. In fact, the probability is 0
that there could be an OSP with three or more cycles for any value of $\sigma$.
However, the probability is not zero that there can exist particular
values of $\sigma$ at which an OSP has exactly two disjoint cycles, and it is
this type of OSP that the above procedure tends to discover. The problem,
then, is this: Given $ESP_1$ that is an OSP over some interval $[\sigma^*,\sigma^{**}]$
and $OSP_1$ at $\sigma^{**}$ that is not an ESP, how can we discover an $ESP_2$ that
is OSP over $[\sigma^{**},\sigma^{***}]$ where $\sigma^{***} > \sigma^{**}$?

From $ESP_1$, we can obtain numbers $x_i$, $y_j$, and $\omega$ that satisfy
(3.6) at $\sigma = \sigma^{**}$. There is a tie in one of the equations (3.6), which
we assume for convenience to be in (3.6a):

$$x_i = \sigma^{**} a(i,j_1) + y_{j_1} = \sigma^{**} a(i,j_2) + y_{j_2},$$

where

$$j_1 = q^*(i) \quad \text{and} \quad j_2 = q^{**}(i),$$

$$ESP_1 = (p^*,q^*), \quad \text{and} \quad OSP_1 = (p^*,q^{**}).$$

We assume $OSP_1$ has exactly two cycles, one of which must be the $ESP_1$
cycle. The rows and columns can be partitioned into $S_1$ and $S_2$, where
$S_1$ includes all rows and columns in the $ESP_1$ cycle, plus all those rows
and columns that $OSP_1$ maps into the $ESP_1$ cycle, and $S_2$ is defined
similarly for the other cycle. Define $x(\delta)$ and $y(\delta)$ by

$$x_i(\delta) = \begin{cases} x_i & \text{if } i \in S_1 \\ x_i - \delta & \text{if } i \in S_2 \end{cases}$$

$$y_j(\delta) = \begin{cases} y_j & \text{if } j \in S_1 \\ y_j - \delta & \text{if } j \in S_2 \end{cases}$$

Since (3.7) with $OSP_1$ never compares a row in $S_1$ with a column in $S_2$,
or a row in $S_2$ with a column in $S_1$, $x(\delta)$, $y(\delta)$ and $OSP_1$ will solve
(3.7) regardless of $\delta$. Let $\bar{\delta}$ be the largest $\delta$ such that $x(\delta)$, $y(\delta)$
satisfy (3.6). Since we have assumed there is only one tie when $\delta = 0$,
$\bar{\delta} > 0$. For $0 < \delta < \bar{\delta}$, there are no ties. When $\delta = \bar{\delta}$, there is at
least one tie, and we assume there is exactly one. Resolving the tie one
way, we obtain $OSP_1$. Resolving the tie the other way, we obtain $OSP_2$
different from $OSP_1$ and $ESP_1$. If $OSP_2$ is actually $ESP_2$, we re-solve
(3.7) and proceed with interval extension. If $OSP_2$ is not an ESP, then with
probability one it has the same two cycles as $OSP_1$, since otherwise there
would be three distinct cycles with the same average payoff in the sense of
(2.16). Let $S_1$ be all those rows and columns that $OSP_2$ maps to the
first (original $ESP_1$) cycle, and similarly for $S_2$. Now repeat the
process of subtracting $\delta$ from every $x_i$ or $y_j$ with subscript in $S_2$
until still a different tie is revealed, with corresponding $OSP_3$, etc.
Sooner or later a new ESP will be discovered, and the basic process of
interval extension can continue.

The above is not intended to be a proof that the procedure will
work, but only as an explanation of the process used by a FORTRAN program
called MATDIF to solve matrix-differential games. MATDIF is failsafe, in the
sense that it deals only with solutions of (3.6) and (3.7), but it has not
been proved that MATDIF will always provide an answer for all of [0,1].
However, MATDIF has not failed to produce an answer for any of the
approximately 100 test matrices with elements chosen to be uniform random
numbers between 0 and 100. The program compiles in about 10 secs on the
NPS IBM 360/67, and the run time is approximately $120 (M/50)^{1.5} (N/50)^{1.5}$
seconds (see Table 1). The program is available from Washburn. In the
future, a proof that MATDIF, or a procedure modified to account for
degeneracies, will always provide an answer will be provided.
$$T(M,N) \approx 120 \left( \frac{M}{50} \right)^{1.5} \left( \frac{N}{50} \right)^{1.5}$$

            N = 9                    N = 18                   N = 50
M = 9       .6 seconds (20 runs)     1.5 seconds (30 runs)    -
M = 18      1.5 seconds (30 runs)    4.5 seconds (2 runs)     -
M = 50      10 seconds (10 runs)     -                        120 seconds (3 runs)

Table 1    Average run times for MATDIF in seconds
Example: Figure 1 shows $\Omega(\sigma)$ for a typical 18 x 18 game. There are 20
distinct linear segments in this case. MATDIF actually considers 70
different ESP's, but many of the (distinct) ESP's have the same cycle.
The solution proceeds by interval extension up to $\sigma = \sigma_1 = .53169$. The
$ESP_1$ that is OSP at $\sigma_1$ and for slightly smaller values is illustrated
in Figure 2; each column has an "x" corresponding to I's choice in the
column, and each row has an "o" corresponding to II's choice in the row.
So x's move horizontally to o's, and o's move vertically to x's. The only
cycle is shown solid in the figure.

The $OSP_1$ following $ESP_1$ is identical to $ESP_1$ except that the
o in row 16 is moved from column 11 to column 2. A new cycle forms, shown
as a dashed line. Both cycles have the same average payoff. The reader
might amuse himself by delineating $S_1$ and $S_2$. $OSP_2$ is formed by moving
the o in row 6 from column 3 to column 6, and still has the same two cycles.
$OSP_3$ is formed by moving the x in column 16 from row 16 to row 10, and
also has the same two cycles. Finally, $OSP_4$ is formed by moving the o in
row 5 from column 17 to column 8, leaving only the dashed cycle and hence
a new ESP. This ESP is valid for $\sigma_1 \le \sigma \le .5402$, and only ESP's are
encountered for larger $\sigma$. The run time for solving this game is 4.12
seconds.
FIGURE 1

$\Omega(\sigma)$ for an 18 x 18 game

FIGURE 2

Illustration of cycles at $\sigma = .53169$
with same game as in FIGURE 1.


REFERENCES 

[1]  John M. Danskin, Jr., private communication, 1973.

[2]  John M. Danskin, Jr., Values in Differential Games, to be published.

[3]  R. Elliott, A. Friedman, and N. Kalton, "Alternate Play in
Differential Games," to appear.

[4]  Avner Friedman, Differential Games, Wiley, N.Y.

[5]  D. Gillette, "Stochastic Games with Zero Stop Probabilities,"
Contributions to the Theory of Games, vol. III, M. Dresher,
A.W. Tucker, P. Wolfe, eds., Princeton U. Press, Princeton, 1957,
pp. 179-187.

[6]  T. Liggett and S. Lippman, "Stochastic Games with Perfect Information
and Time Average Payoff," SIAM Review, Vol. 11, No. 4, Oct. 1969, pp.
604-607.




INITIAL  DISTRIBUTION  LIST 

No.  of  Copies 

Defense  Documentation  Center 

Cameron  Station 

Alexandria,  Virginia  22314  12 

Library  (Code  0212) 

Naval  Postgraduate  School 

Monterey,  California  93940  2 

Dean  of  Research 

Code  023 

Naval  Postgraduate  School  1 

Library  (Code  55) 

Department  of  Operations  Research 

and  Administrative  Sciences 
Naval  Postgraduate  School 
Monterey,  California  93940  2 

Dr.  John  M.  Danskin 

University  of  California 

College  of  Engineering 

Electronics  Res.  Lab. 

Berkeley,  California  94720  1 

Dr.  Bruno  O.  Shubert

Department  of  Operations  Research 

and  Administrative  Sciences 
Naval  Postgraduate  School 
Monterey,  California  93940  10 

Dr.  Alan  R.  Washburn 

Department  of  Operations  Research 

and  Administrative  Sciences 
Naval  Postgraduate  School 
Monterey,  California  93940  10 

Professor  Avner  Friedman 

Northwestern  University 

Evanston,  Illinois  60201  1 

Dr.  L.  S.  Shapley 

RAND  Corporation 

1700  Main  Street 

Santa  Monica,  California  90406  1 




